Character Animation


Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions

Abootorabi, Mohammad Mahdi, Ghahroodi, Omid, Zahraei, Pardis Sadat, Behzadasl, Hossein, Mirrokni, Alireza, Salimipanah, Mobina, Rasouli, Arash, Behzadipour, Bahar, Azarnoush, Sara, Maleki, Benyamin, Sadraiye, Erfan, Feriz, Kiarash Kiani, Nahad, Mahdi Teymouri, Moghadasi, Ali, Abianeh, Abolfazl Eshagh, Nazar, Nizi, Rabiee, Hamid R., Baghshah, Mahdieh Soleymani, Ahmadi, Meisam, Asgari, Ehsaneddin

arXiv.org Artificial Intelligence

Generative AI is reshaping art, gaming, and most notably animation. Recent breakthroughs in foundation and diffusion models have reduced the time and cost of producing animated content. Characters are the central components of animation, involving motion, emotions, gestures, and facial expressions. The pace and breadth of advances in recent months make it difficult to maintain a coherent view of the field, motivating the need for an integrative review. Unlike earlier overviews that treat avatars, gestures, or facial animation in isolation, this survey offers a single, comprehensive perspective on all the main generative AI applications for character animation. We begin by examining the state-of-the-art in facial animation, expression rendering, image synthesis, avatar creation, gesture modeling, motion synthesis, object generation, and texture synthesis. We highlight leading research, practical deployments, commonly used datasets, and emerging trends for each area. To support newcomers, we also provide a comprehensive background section that introduces foundational models and evaluation metrics, equipping readers with the knowledge needed to enter the field. We discuss open challenges and map future research directions, providing a roadmap to advance AI-driven character-animation technologies. This survey is intended as a resource for researchers and developers entering the field of generative AI animation or adjacent fields. Resources are available at: https://github.com/llm-lab-org/Generative-AI-for-Character-Animation-Survey.


Diffuse-CLoC: Guided Diffusion for Physics-based Character Look-ahead Control

Huang, Xiaoyu, Truong, Takara, Zhang, Yunbo, Yu, Fangzhou, Sleiman, Jean Pierre, Hodgins, Jessica, Sreenath, Koushil, Farshidian, Farbod

arXiv.org Artificial Intelligence

We present Diffuse-CLoC, a guided diffusion framework for physics-based look-ahead control that enables intuitive, steerable, and physically realistic motion generation. While existing kinematic motion generation methods based on diffusion models offer intuitive steering capabilities through inference-time conditioning, they often fail to produce physically viable motions. In contrast, recent diffusion-based control policies have shown promise in generating physically realizable motion sequences, but their lack of kinematic prediction limits their steerability. Diffuse-CLoC addresses these challenges through a key insight: modeling the joint distribution of states and actions within a single diffusion model makes action generation steerable by conditioning it on the predicted states. This approach allows us to leverage established conditioning techniques from kinematic motion generation while producing physically realistic motions. As a result, we achieve planning capabilities without the need for a high-level planner. Our method handles a diverse set of unseen long-horizon downstream tasks through a single pre-trained model, including static and dynamic obstacle avoidance, motion in-betweening, and task-space control. Experimental results show that our method significantly outperforms the traditional hierarchical framework of high-level motion diffusion and low-level tracking.
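The joint state-action guidance idea can be illustrated with a toy sketch: a stand-in denoiser produces a clean estimate of a trajectory of concatenated states and actions, and the gradient of a task cost computed on the predicted states steers the whole joint sample at each denoising step. Everything below (the denoiser, dimensions, quadratic goal cost, and guidance scale) is an illustrative assumption, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM, HORIZON = 4, 2, 8

def toy_denoiser(x, t):
    # Stand-in for a trained diffusion model over joint [state, action]
    # trajectories; here it simply shrinks the sample toward zero.
    return x * (1.0 - 0.1 * t)

def guided_sample(goal, steps=10, guidance_scale=0.5):
    x = rng.standard_normal((HORIZON, STATE_DIM + ACTION_DIM))
    for t in reversed(range(steps)):
        x0_hat = toy_denoiser(x, t / steps)  # predicted clean trajectory
        # Analytic gradient of the goal cost ||s_T - goal||^2, taken only
        # w.r.t. the predicted final state; in the real model, actions are
        # steered too because the denoiser couples states and actions.
        grad = np.zeros_like(x0_hat)
        grad[-1, :STATE_DIM] = 2.0 * (x0_hat[-1, :STATE_DIM] - goal)
        x = x0_hat - guidance_scale * grad   # guidance nudge on the joint sample
    return x

traj = guided_sample(goal=np.zeros(STATE_DIM))
```

With the shared denoiser replaced by a trained network, the same loop reuses standard conditioning tricks from kinematic motion diffusion while the action channels keep the sample physically executable.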


Zero-shot High-fidelity and Pose-controllable Character Animation

Zhu, Bingwen, Wang, Fanyi, Lu, Tianyi, Liu, Peng, Su, Jingwen, Liu, Jinxiu, Zhang, Yanhao, Wu, Zuxuan, Qi, Guo-Jun, Jiang, Yu-Gang

arXiv.org Artificial Intelligence

Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistent character appearance and poor preservation of fine details. Moreover, they require large amounts of video data for training, which is computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings to preserve character-independent content and maintain precise alignment of actions; 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details; and 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character from the background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transitions. Extensive experimental results demonstrate that our approach outperforms state-of-the-art training-based methods in terms of character consistency and detail fidelity, and it maintains a high level of temporal coherence throughout the generated animations.


PDP: Physics-Based Character Animation via Diffusion Policy

Truong, Takara E., Piseno, Michael, Xie, Zhaoming, Liu, C. Karen

arXiv.org Artificial Intelligence

Generating diverse and realistic human motion that can physically interact with an environment remains a challenging research area in character animation. Meanwhile, diffusion-based methods, as explored in the robotics community, have demonstrated the ability to capture highly diverse and multi-modal skills. However, naively training a diffusion policy often results in unstable motions for high-frequency, under-actuated control tasks like bipedal locomotion, because rapidly compounding errors push the agent away from optimal training trajectories. Our key insight lies in using RL policies not just to provide optimal trajectories but to provide corrective actions in sub-optimal states, giving the policy a chance to correct for errors caused by environmental stimuli, model errors, or numerical errors in simulation. Our method, Physics-Based Character Animation via Diffusion Policy (PDP), combines reinforcement learning (RL) and behavior cloning (BC) to create a robust diffusion policy for physics-based character animation. We demonstrate PDP on perturbation recovery, universal motion tracking, and physics-based text-to-motion synthesis.
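The data-collection idea behind the corrective-action insight can be sketched as follows: instead of cloning the expert only on its own optimal trajectory, we query it in deliberately perturbed states, so the cloned policy sees (sub-optimal state, corrective action) pairs. The expert controller, dynamics, dimensions, and noise level here are all illustrative stand-ins, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def expert_policy(state):
    # Stand-in for a trained RL expert: a simple stabilizing controller
    # that pushes the state back toward the origin.
    return -0.5 * state

def rollout_with_perturbations(steps=50, noise_std=0.2):
    """Collect (perturbed state, corrective expert action) pairs.

    Querying the expert in noisy, sub-optimal states teaches the cloned
    policy corrective behavior rather than pure trajectory replay.
    """
    data = []
    state = rng.standard_normal(3)
    for _ in range(steps):
        noisy_state = state + noise_std * rng.standard_normal(3)
        action = expert_policy(noisy_state)   # corrective action at the perturbed state
        data.append((noisy_state.copy(), action.copy()))
        state = state + action                # toy integrator dynamics
    return data

dataset = rollout_with_perturbations()
```

A diffusion policy trained by behavior cloning on such a dataset inherits the expert's recovery behavior in states it would otherwise drift into through compounding errors.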


Taming Diffusion Probabilistic Models for Character Control

Chen, Rui, Shi, Mingyi, Huang, Shaoli, Tan, Ping, Komura, Taku, Chen, Xuelin

arXiv.org Artificial Intelligence

We present a novel character control framework that effectively utilizes motion diffusion probabilistic models to generate high-quality and diverse character animations, responding in real-time to a variety of dynamic user-supplied control signals. At the heart of our method lies a transformer-based Conditional Autoregressive Motion Diffusion Model (CAMDM), which takes as input the character's historical motion and can generate a range of diverse potential future motions conditioned on high-level, coarse user control. To meet the demands for diversity, controllability, and computational efficiency required by a real-time controller, we incorporate several key algorithmic designs. These include separate condition tokenization, classifier-free guidance on past motion, and heuristic future trajectory extension, all designed to address the challenges associated with taming motion diffusion probabilistic models for character control. As a result, our work represents the first model that enables real-time generation of high-quality, diverse character animations driven by interactive user control, supporting animation of a character in multiple styles with a single unified model. We evaluate our method on a diverse set of locomotion skills, demonstrating the merits of our method over existing character controllers. Project page and source code: https://aiganimation.github.io/CAMDM/
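Classifier-free guidance on past motion can be sketched in a few lines: run the denoiser twice, once with the motion-history condition and once with it dropped, then blend the two predictions with a guidance weight. The stand-in denoiser and its coefficients below are illustrative assumptions; CAMDM's actual denoiser is a trained transformer:

```python
import numpy as np

def denoise(x, past_motion, ctrl):
    # Stand-in for a conditional motion denoiser; `past_motion=None`
    # means the history condition is dropped (the "unconditional" branch).
    base = 0.9 * x + 0.1 * ctrl
    if past_motion is not None:
        base = base + 0.05 * past_motion
    return base

def cfg_denoise(x, past_motion, ctrl, w=2.0):
    """Classifier-free guidance on the past-motion condition.

    w > 1 strengthens adherence to the motion history; w < 1 trades
    history consistency for diversity in the generated future motion.
    """
    cond = denoise(x, past_motion, ctrl)
    uncond = denoise(x, None, ctrl)
    return uncond + w * (cond - uncond)

guided = cfg_denoise(np.ones(4), past_motion=np.ones(4), ctrl=np.zeros(4), w=2.0)
```

In training, this requires randomly dropping the past-motion condition so the same network learns both branches; at inference the weight becomes a runtime knob on the diversity-consistency trade-off.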


Discovering Fatigued Movements for Virtual Character Animation

Cheema, Noshaba, Xu, Rui, Kim, Nam Hee, Hämäläinen, Perttu, Golyanik, Vladislav, Habermann, Marc, Theobalt, Christian, Slusallek, Philipp

arXiv.org Artificial Intelligence

Virtual character animation and movement synthesis have advanced rapidly during recent years, especially through a combination of extensive motion capture datasets and machine learning. A remaining challenge is interactively simulating characters that fatigue when performing extended motions, which is indispensable for the realism of generated animations. However, capturing such movements is problematic, as performing movements like backflips with fatigued variations up to exhaustion raises capture cost and risk of injury. Surprisingly, little research has been done on faithful fatigue modeling. To address this, we propose a deep reinforcement learning-based approach, which -- for the first time in the literature -- generates control policies for full-body physically simulated agents aware of cumulative fatigue. First, we leverage Generative Adversarial Imitation Learning (GAIL) to learn an expert policy for the skill; second, we learn a fatigue policy by replacing the constant torque bounds based on endurance time with non-linear, state- and time-dependent limits in the joint-actuation space, using a Three-Compartment Controller (3CC) model. Our results demonstrate that agents can adapt to different fatigue and rest rates interactively, and discover realistic recovery strategies without the need for any captured data of fatigued movement.
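The 3CC dynamics, as commonly formulated (after Xia and Frey Law), track three motor-unit pools: active, resting, and fatigued, with a drive term moving units between active and resting to meet a target load, a fatigue rate draining the active pool, and a recovery rate refilling the resting pool. The sketch below uses assumed, illustrative parameter values and a simplified drive, not the paper's calibration:

```python
def threecc_step(ma, mr, mf, target_load, dt,
                 F=0.01, R=0.002, LD=10.0, LR=10.0):
    """One Euler step of a Three-Compartment Controller (3CC) fatigue model.

    ma, mr, mf: active, resting, and fatigued motor-unit fractions
    (ma + mr + mf is conserved). F and R are fatigue and recovery rates;
    LD and LR are activation and relaxation drive rates.
    """
    if ma < target_load:
        c = LD * min(target_load - ma, mr)   # recruit resting units toward the load
    else:
        c = LR * (target_load - ma)          # relax surplus active units
    dma = c - F * ma                         # activation inflow minus fatigue outflow
    dmr = -c + R * mf                        # resting pool refilled by recovery
    dmf = F * ma - R * mf                    # fatigued pool: inflow minus recovery
    return ma + dt * dma, mr + dt * dmr, mf + dt * dmf

# Sustained effort at 50% load: fatigue accumulates, shrinking the pool
# of units available to meet the load, which the paper maps to shrinking
# state- and time-dependent torque limits in the joint-actuation space.
ma, mr, mf = 0.0, 100.0, 0.0
for _ in range(20000):                       # 200 s at dt = 0.01
    ma, mr, mf = threecc_step(ma, mr, mf, target_load=50.0, dt=0.01)
```

Because the three derivatives sum to zero, the total motor-unit fraction is conserved; varying F and R reproduces the different fatigue and rest rates the agents adapt to.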


Researchers' AI can perform 3D motion capture with any off-the-shelf camera

#artificialintelligence

But researchers at the Max Planck Institute and Facebook Reality Labs claim they've developed a machine learning algorithm -- PhysCap -- that works with any off-the-shelf DSLR camera running at 25 frames per second. In a paper expected to be published in the journal ACM Transactions on Graphics in November 2020, the team details what they say is the first of its kind for real-time, physically plausible 3D motion capture that accounts for environmental constraints like floor placement. PhysCap ostensibly achieves state-of-the-art accuracy on existing benchmarks and qualitatively improves stability at training time. Motion capture is a core part of modern film, game, and app development. Attempts to make motion capture practical for amateur videographers have ranged from a $2,500 suit to a commercially available framework that leverages Microsoft's depth-sensing Kinect.


Local Motion Phases Technique Boosts Basketball Animation Richness and Realism

#artificialintelligence

Researchers from the University of Edinburgh School of Informatics and video game company Electronic Arts have proposed a novel framework that learns fast and dynamic character interactions. Trained on an unstructured basketball motion capture database, the model can animate multiple contacts between a player and the ball, other players, and the environment. The team's modular and stable framework for data-driven character animation includes data processing, network training, and runtime control, and was developed using Unity, TensorFlow, and PyTorch. The approach can perform complex and realistic animations of bipeds or quadrupeds engaged in sports and beyond. Enabling characters to perform a wide variety of dynamic, fast-paced, and quickly changing movements is a key challenge in character animation.


Generating Character Animations from Speech with AI - NVIDIA Developer News Center

#artificialintelligence

Researchers from the Max Planck Institute for Intelligent Systems, a member of NVIDIA's NVAIL program, developed an end-to-end deep learning algorithm that can take any speech signal as input – and realistically animate it in a wide range of adult faces. "There is an extensive literature on estimating 3D face shape, facial expressions, and facial motion from images and videos. Less attention has been paid to estimating 3D properties of faces from sound," the researchers stated in their paper. "Understanding the correlation between speech and facial motion thus provides additional valuable information for analyzing humans, particularly if visual data are noisy, missing, or ambiguous." The team first collected a new dataset of 4D face scans together with speech.


DeepDribble: Simulating Basketball with AI

#artificialintelligence

When training physically simulated characters in basketball skills, competing demands such as realism and responsiveness must be held in balance. While AAA game titles like EA's NBA LIVE and NBA 2K have made drastic improvements to their graphics and character animation, basketball video games still rely heavily on canned animations. The industry is always looking for new methods for creating gripping, on-court action in a more personalized, interactive way. In a recent paper by DeepMotion Chief Scientist Libin Liu and Carnegie Mellon University Professor Jessica Hodgins, virtual agents are trained to simulate a range of complex ball-handling skills in real time. This blog gives an overview of their work and results, which will be presented at SIGGRAPH 2018.